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Abstract 

If constraints are imposed on a macromolecule, two inequivalent classi- 
cal models may be used: the stiff and the rigid one. This work studies the 
effects of such constraints on the Conformational Equilibrium Distribution 
(CED) of the model dipeptide HCO-L-Ala-NHb without any simplifying 
assumption. We use ab initio Quantum Mechanics calculations including 
electron correlation at the MP2 level to describe the system, and we mea- 
sure the conformational dependence of all the correcting terms to the naive 
CED based in the Potential Energy Surface (PES) that appear when the 
constraints are considered. These terms are related to mass-metric ten- 
sors determinants and also occur in the Fixman's compensating potential. 
We show that some of the corrections are non-negligible if one is inter- 
ested in the whole Ramachandran space. On the other hand, if only the 
energetically lower region, containing the principal secondary structure 
elements, is assumed to be relevant, then, all correcting terms may be ne- 
glected up to peptides of considerable length. This is the first time, as far 
as we know, that the analysis of the conformational dependence of these 
correcting terms is performed in a relevant biomolecule with a realistic 
potential energy function. 

PACS: 87.14.Ee, 87.15.-v, 87.15.Aa, 87.15. Cc, 89.75.-k 
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1 Introduction 



In computer simulations of large complex systems, such as macromolecules 
and, specially, proteins [1-6], one of the main bottlenecks to design efficient 
algorithms is the necessity to sample an astronomically large conformational 
space [3,7]. In addition, being the typical timescales of the different movements 
in a wide range, demandingly small timesteps must be used in Molecular Dy- 
namics simulations in order to properly account for the fastest modes, which 
lie in the femtosecond range. However, most of the biological interesting be- 
haviour (allostcric transitions, protein folding, enzymatic catalysis) is related to 
the slowest conformational changes, which occur in the timcscalc of milliseconds 
or even seconds [4,8-11]. Fortunately the fastest modes are also the most en- 
ergetic ones and are rarely activated at room temperature. Therefore, in order 
to alleviate the computational problems and also simplify the images used to 
think about these elusive systems, one may naturally consider the reduction of 
the number of degrees of freedom describing macromolecules via the imposition 
of constraints [12]. 

How to study the conformational equilibrium of these constrained systems 
has been an object of much debate [13-17]. Two different classical models exist 
in the literature which are conceptually [13-16,18,19] and practically [6,13,20- 
24] inequivalent. In the classical rigid model, the constraints are assumed to be 
exact and all the velocities that are orthogonal to the hypersurface defined by 
them vanish. In the classical stifft}\ model, on the other hand, the constraints are 
assumed to be approximate and they are implemented by a steep potential that 
drives the system to the constrained hypersurface. In this case, the orthogonal 
velocities are activated and may act as "heat containers" . 

In this work, we do not address the question of which model is a better 
approximation of physical reality. Although, in the literature, it is commonly 
assumed (often implicitly) that the classical stiff model should be taken as a ref- 
erence [6,9,16,19,20,22,26], we believe that this opinion is much influenced by 
the use of popular classical force fields [6,27-37] (which are stiff by construction) 
and by the goal of reproducing their results at a lower computational cost, i.e., 
using rigid Molecular Dynamics simulations [4,5,8,9,14,19,21-23,25,26,38^2]. 
In our opinion, the question whether the rigid or the stiff model should be used 
to approximate the real quantum mechanical statistics of an arbitrary organic 
molecule has not been satisfactorily answered yet. For discussions about the 
topic, see references [13-15,17,18,43-45]. In this work, we adopt the cautious 
position that any of the two models may be useful in certain cases or for certain 
purposes and we study them both on equal footing. Our concern is, then, to 
study the effects that either way of imposing constraints causes in the confor- 
mational equilibrium of macromolecules. 

In the Born-Oppcnhcimcr approximation [46] customarily used in Quantum 
Mechanics and in the majority of the classical force fields, the relevant degrees 

'Some authors use the word flexible to refer to this model [15,21,22,25]. We, however, 
prefer to term it stiff [18] and keep the name flexible to refer to the case in which no constraints 
are imposed. 
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of freedom are the Euclidean (also called Cartesian by some authors) 3n co- 
ordinates of the n nuclei. However, it is frequent to define a different set of 
coordinates in which the overall translation and rotation of the system are dis- 
tinguished and the remaining 3n — 6 degrees of freedom are chosen (according to 
different prescriptions as internal coordinates, which are simple geometrical pa- 
rameters (typically consisting of bond lengths, bond angles and dihedral angles) 
that describe the internal structure of the system [47] . 

In macromolecules, the natural constraints are those derived from the rela- 
tive rigidity of the internal covalent structure of groups of atoms that share a 
common center (and also from the rigidity of rotation around double or triple 
bonds) compared to the energetically "cheaper" rotation around single bonds. 
In internal coordinates, these chemical constraints may be directly implemented 
by asking that some conveniently selected hard coordinates (normally, bond 
lengths, bond angles and some dihedrals) have constant values or values that 
depend on the remaining soft coordinates (see ref. [15] for a definition). In 
Euclidean coordinates, on the other hand, the expression of the constraints is 
more cumbersome and complicated procedures [25,26,40,48-50] must be used 
at each timestep to implement them in Molecular Dynamics simulations. This 
is why, in the classical stiff model, as well as in the rigid one, it is common to 
use internal coordinates and they are also the choice throughout this work. 

In the equilibrium Statistical Mechanics of both the stiff and rigid models, 
the marginal probability density in the coordinate part of the phase space in 
these internal coordinates is not proportional to the naive exp[— /3Vs(V)], where 
Vs(q l ) denotes the potential energy on the constrained hypersurfacqj. Instead, 
some correcting terms that come from different sources must be added to the 
potential energy Vs(q z ) [13, 15, 18, 19,39,51,52]. These terms involve determi- 
nants of mass-metric tensors and also of the Hessian matrix of the constraining 
part of the potential (see sec. [2]). If Monte Carlo simulations in the coordinate 
space are to be performed [5, 53-57] and the probability densities that corre- 
spond to any of these two models sampled, the corrections should be included 
or, otherwise, showed to be negligible. 

Additionally, the three different correcting terms are involved in the defini- 
tion of the so-called Fixman's compensating potential [16], which is frequently 
used to reproduce the stiff equilibrium distribution using rigid Molecular Dy- 
namics simulations [9,14,19, 21-23, 38, 39, 42, 51] . 

Customarily in the literature, some of these corrections to the potential 
energy are assumed to be independent of the conformation and thus dropped 
from the basic expressions. Also, subtly entangled to the assumptions underly- 
ing many classical results, a second type of approximation is made that consists 
of assuming that the equilibrium values of the hard coordinates do not depend 
on the soft coordinates. 

In this work, we measure the conformational dependence of all correcting 
terms and of the Fixman's compensating potential in the model dipeptide HCO- 

2 By q', we denote the soft internal coordinates of the system. See sec.[2]and the Appendix 
for a precise definition. 



3 



L-Ala-NH2 without any simplifying assumption. The potential energy function is 
considered to be the effective Born-Oppenheimer potential for the nuclei derived 
from ab initio quantum mechanical calculations including electron correlation 
at the MP2 level. We also repeat the calculations, with the same basis set (6- 
31++G(d,p)) and at the Hartree-Fock level of the theory in order to investigate 
if this less demanding method without electron correlation may be used in fur- 
ther studies. It is the first time, as far as we are aware, that this type of study is 
performed in a relevant biomolecule with a realistic potential energy function. 

In sec. [5J we introduce the notation to be used and derive the Statistical 
Mechanics formulae of the rigid and stiff models in the general case. In sec. [3J 
we describe the computational methods used and we summarize the factoriza- 
tion of the external coordinates presented in ref. [58]. Sec. [4] is devoted to the 
presentation and discussion of the assessment of the approximation that con- 
sists of neglecting the different corrections to the potential energy in the model 
dipeptide HCO-L-Ala-NH2, without any simplifying assumption, which is the 
central aim of this work. The conclusions are summarized in sec. [5] Finally, in 
the appendix, we discuss the use of the different approximations in the literature 
and we give a precise definition of exactly and approximately separable hard and 
soft coordinates which will shed some light on the relation between the different 
types of simplifications aforementioned. 

2 Theory 

First of all, it is convenient to introduce certain notational conventions that will 
be used extensively in the rest of the work: 

• The system under scrutiny will be a set of n mass points termed atoms. 
The Euclidean coordinates of the atom a in a set of axes fixed in space 
are denoted by x a . The subscript a runs from 1 to n. 

• The curvilinear coordinates suitable to describe the system will be denoted 
by q^ , fj, = 1, . . . , 3n and the set of Euclidean coordinates by x M when no 
explicit reference to the atoms index needs to be made. We shall often 
use N := in for the total number of degrees of freedom. 

• The coordinates are split into (q A ,q a ), a = 7,...,N. The first six 
are termed external coordinates and are denoted by q A . They describe 
the overall position and orientation of the system with respect to a frame 
fixed in space (see ref. 58 for further details). The coordinates q a are said 
internal coordinates and determine the positions of the atoms in the frame 
fixed in the system. They parameterize what we shall call the internal 
subspace or conformational space, denoted by X and the coordinates q A 
parameterize the external subspace, denoted by £. 

• The general set-up of the problem may be described as follows: Instead 
of us being interested on the conformational equilibrium of the system in 
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the external subspace £ plus the whole internal subspace X (i.e., the whole 
space, denoted by £ x X), we wish to find the probability density on a 
hypersurface Sclof dimension M (plus the external subspace £), i.e., 
on £ x S. 

• In typical internal coordinates q a , normally consisting of bond lengths, 
bond angles and dihedral angles (see ref. 59 and references therein), the 
hypersurface £ is described via L := N — M — 6 constraints: 

r/ = /V) I = M + 7,...,N, (2.1) 

where the q a are split into q a = (q^q 1 ), and the q l , i = 7, . . . ,M + 6, 
which parameterize S, are called internal soft coordinates, whereas the q 1 
are termed hard coordinates. The external coordinates q A , together with 
the q % , form the whole set of soft coordinates, denoted by q u = (q A ,q l ), 



u = l,.. 


.,M-f 


6. 








In table [fl 


a summary 


of the indices used is g 


;iven. 


Indices 


Ranj 






Number 


Description 


a,/?, 7, . . . 


1,.. 


,71 




n 


Atoms 


/!, V,p, ... 


1,.. 


,N 




N = 3n 


All coordinates 


A,B,C,... 


1,.. 


,6 




6 


External coordinates 


a,b,c, . . . 


7,.. 


,W 




iV-6 


Internal coordinates 


i, j, k, . . . 


7,.. 


,M- 


h6 


M 


Soft internal coordinates 


I,J,K,... 


M + 


7,... 




L = N - M - 6 


Hard internal coordinates 


U, V, w, . . . 


1,.. 


,M- 


h 6 


M + 6 


All soft coordinates 



Table 1: Definition of the indices used. 



2.1 Classical stiff model 

In the classical stiff model, the constraints in eq. 12. H are implemented by impos- 
ing an strong energy penalization when the internal conformation of the system, 
described by q a , departs from the constrained hypersurface S. To ensure this, 
we must have that the potential energy function in X satisfies certain conditions. 
First, we write the potential V{q a ) as follow^: 

V(q i ,q I ) = V(q i ,f I (q i )) + \v(q\q I )-V(q i ,f I (<f))l . (2.2) 



W) V c {qW) 
Next, we impose the following conditions on the constraining potential 
V c {q 1 ,q I ) defined above: 



3 Note that we have simply added and subtracted from the total potential energy V(q l , q 1 ) 
V(q a ) of the system the same quantity, VUf, f I (q')). 
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(i) That V^q* , f 1 (q*)) < V^q^q 1 ) \lq\q 1 ", i.e., that E be the global mini- 
mum of V" c (and, henceforth, a local one too) with respect to variations of 
the hard coordinates. 

(ii) That, for small variations Aq 1 on the hard coordinates (i.e., for changes 
Aq 1 considered as physically irrelevant), the associated changes in V c (q l , q 1 ) 
are much larger than the thermal energy RT. 

The advantages of this formulation, much similar to that on [15], are many. 
First, it sets a convenient framework for the derivation of the Statistical Me- 
chanics formulae of the classical stiff model relating it to the fully flexible model 
in the whole space £ x T. Second, it clearly separates the potential energy on 
E from the part that is responsible of implementing the constraints. Third, 
contrarily to the formulation based on delta functions [51], it allows to clearly 
understand the necessity of including the correcting term associated to the de- 
terminant of the Hessian of V c (see the derivation that follows). Finally, and 
more importantly for us, it provides a direct prescription for calculating V^(q l ) 
and E (the Potential Energy Surface (PES), frequently used in Quantum Chem- 
istry calculations [60 64]) via geometry optimization at fixed values of the soft 
coordinates. 

We also remark that, in order to satisfy point (ii) above and to allow the 
derivation of the different correcting terms that follows and the validity of the 
final expressions, the hard coordinates q 1 must be indeed hard, however, the soft 
coordinates q l do not have to be soft (in the sense that they produce energetic 
changes much smaller than RT when varied) . They may be interesting for some 
other reason and hence voluntarily picked to describe the system studied, with- 
out altering the formulae presented in this section. Despite this qualifications, 
the terms soft and hard will be kept in this work for consistence with most 
of the existing literature [15,18,52,65,66], although, in some cases, the labels 
important and unimportant (for q % and q 1 respectively), proposed by Karplus 
and Kushick [67], may be more appropriate. 

In the case of the model dipeptide HCO-L-Ala-NH 2 investigated in this work, 
for example, the barriers in the Ramachandran angles <f> and ip may be as large as 
~ 40 RT, however, the study of small dipeptides is normally aimed to the design 
of effective potentials for polypeptides [68-70] , where long-range interactions in 
the sequence may compensate these local energy penalizations. This and the fact 
that the Ramachandran angles are the relevant degrees of freedom to describe 
the conformation of the backbone of these systems, make it convenient to choose 
them as soft coordinates q l despite the fact that they may be energetically hard 
in the case of the dipeptide HCO-L-Ala-NH2. As remarked above, this does not 
affect the calculations. 

Now, due to condition (ii) above, the statistical weights of the conforma- 
tions which lie far away from the constrained hypersurface E are negligible and, 
therefore, it suffices to describe the system in the vicinity of the equilibrium 
values of the q 1 . In this region, for each value of the internal soft coordinates q 1 , 
we may expand V c (q l ,q I ) in eq. 12.21 up to second order in the hard coordinates 
around E (i.e., around q 1 = f I (q 1 )) and drop the higher order terms: 
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V c (qW) * V c {q i ,f I (q i )) + 




dV c 



/V)) 



(2.3) 



where the subindex S indicates evaluation on the constrained hypersurface 
and a more compact notation, H(q 1 '), has been introduced for the Hessian matrix 
of V c with respect to the hard variables evaluated on S. Also, the Einstein's 
sum convention is assumed on repeated indices. 

In this expression, the zeroth order term V c (q l , f I (q 1 )) is zero by definition 
of V c (see eq. 12. 2\ and the linear term is also zero, because of the condition (i) 
above. Hence, the first non-zero term of the expansion in eq. 12.31 is the second 
order one. Using this, together with eq. 12.21 we may write the stiff Hamiltonian 



W, Pll ) := ^P,G- p (q U ,q I ) Pp + Vj:(q l ) + 



+ ^H,, K W){q J - f J W)){q K - f K {tf)) 



the mass-metric tensor G vp being 



n (a u dx °^ 



dq f 



and G vp its inverse, defined by 



G™{q u ,q I )G ap {q u ,q I ) = 5 v p , 

where 8 v p denotes the Kronecker's delta. 

Therefore, the stiff partition function of the system isQ 



a QM 
h N 



J d^dp^cxpf-/?^ 1 ,^)] , 



(2.4) 



(2.5) 



(2.6) 



(2.7) 



where h is Planck's constant, we denote (3 := 1/RT (per mole energy units 
are used throughout the article, so RT is preferred over fc^T) and aQM is a 
combinatorial number that accounts for quantum indistinguishability and that 
must be specified in each particular case (e.g., for a gas of N indistinguishable 
particles, agM = 

Now, using the condition (ii) again, the q 1 appearing in the mass-metric 
tensor G in H s (in eq. 12. 7j) can be approximately evaluated at their equilibrium 
values f 1 (q % ), yielding, for the stiff partition function, 



4 No Jacobian appears in the integral measure because q M and are obtained from the 
Euclidean coordinates via a canonical transformation [71]. 
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z s = 



Ot-QM 
h N 



J dq^q'dp.exp -(3(^p„G" p (q u J I (q l ))p P + 



+ W) + ^n JK (q l )(q J - / J («*)) («* ~ f K (q 1 )) 



If we now integrate over the hard coordinates q 1 . we have 



(2.8) 



27r^ 2 Oqm 

P 



h N 



j d<fd^exp -p(^PuG vp (q u J I (q i ))p p + 



.R 



W)+T-ln detH(q l ) 



(2.9) 



where the part of the result of the Gaussian integral consisting of det ' TL 
has been taken to the exponent. 

Note that the Hessian matrix TLjk involves only derivatives with respect to 
the hard coordinates (see eq. 12. 3|) , so that the minimization protocol embodied 
in eq. 12.21 (which is identical to the procedure followed in Quantum Chemistry 
for computing the PES along reaction coordinates) guarantees that Hjk is 
positive defined and, hence, det Ti. is positive, allowing to take its logarithm 
as in the previous expression. The fact that it is only this 'partial Hessian' 
that makes sense in the computation of equilibrium properties along soft (or 
reaction) coordinates, has been recently pointed out in ref. 72. 

It is also frequent to integrate over the momenta in the partition function. 
Doing this in eq. 12.91 and taking the determinant of the mass- metric tensor that 
shows urjfl to the exponent, we may write the partition function as an integral 
only on the coordinates: 



Xs(T) I d g «exp 
B 



Vx(q i )+ T^\n[dctH(q l ) 



" T 2 ln 



detGfc/W)) 



(2.10) 



where the multiplicative factor that depends on T has been defined as follows: 

N + L 

(2.11) 



Xs(T) := \% 



P 



OLQM 



h N 



If the exponent in eq. 12.101 is seen as a free energy, then, V?,(q l ) may be 
regarded as the internal energy and the two conformation-dependent correcting 



5 Note that, by G, we denote the matrix that corresponds to the mass-metric tensor with 
two covariant indices G^v. The same convention has been followed for the Hessian matrix H 
in eq. I2.9l and for the reduced mass-metric tensor g in eq. 12.211 
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terms that are added to it as effective entropies (which is compatible with their 
being linear in RT). The second one comes only from the desire to write the 
marginal probabilities in the coordinate space (i.e., averaging the momenta) 
and may be called a kinetic entropy [17], the first term, on the other hand, is 
truly an entropic term that comes from the averaging out of certain degrees of 
freedom and it is reminiscent of the conformational or configurational entropies 
appearing in quasiharmonic analysis [6,67,73]. 
In this spirit, we define 



F 8 (<f) :=Vx(q l )-T{SC( q >) + SHq U )) 
W) :=-|ln[dctH(^)] , 
S s V):=f \n\dct G(q u J 1 (q 1 )) . 



(2.12a) 
(2.12b) 

(2.12c) 



In such a way that the stiff equilibrium probability in the soft subspace £ x £ 
is given by 



P,m= V z' ' With Z::=Jd q ^exp[-m(q a )}- (2-13) 

Now, it is worth remarking that, although the kinetic entropy depends on 
the external coordinates q A , we have recently shown [58] that the determinant 
of the mass-metric tensor G may be written, for any molecule, general internal 
coordinates and arbitrary constraints, as a product of two functions; one de- 
pending only on the external coordinates, and the other only on the internal 
ones q a . Hence the externals-dependent factor in cq. I2.12cl may be integrated 
out independently to yield an effective free energy and a probability density P s 
that depend only on the soft internals q % (see sec. 13. ip . 



2.2 Classical rigid model 

If the relations in cq. 12.11 are considered to hold exactly and are treated as 
holonomic constraints, the Hamiltonian function that describes the Classical 
Mechanics in the subspace {£ X E) C {£ X X), spanned by the coordinates q u , 
may be written as follows: 

H r (q u ,Vu) := \w9 vw {q u )v W + W) , (2-14) 

where the reduced mass-metric tensor g V w{q u ) in £ x E, that appears in the 
kinetic energy, is (see what follows) 



g vw {<?) = G vw (q u ,f I (q i )) + ^^GjKiq^fW))* 



Q q v J ^ >■> V* H Q q w 

and g vw (q u ) is defined to be its inverse in the sense of eq. 12.61 Also, the 
notation 



p.J?" if u:=a* = 1,...,M + 6 

if I:=V = M + 7,...,N 



has been introduced for convenience. 

Note that eq. l2.15l mav derived from the unconstrained Hamiltonian in {£ x 



H(tf, Pli ) := - Pv G v »{q») Pp + V(q a ) , (2.17) 



1 

2 ] 

using the constraints in eq. 12.11 together with its time derivatives (denoted 
by an overdot: as in A) 



and defining the momenta r\ v as 

Vv ~9vU<l u )Q W =9U<f)G w ' 1 (q u J I (q i ))p tl . (2.19) 
Hence, the rigid partition function is 



-P[^vg vw (q u )Vv + V^q l ) 



(2.20) 



Integrating over the momenta, we obtain the marginal probability density 
in the coordinate space analogous to eq. 12.101 



Z T = X r(T) j dq u e^[-^(y s (q i ) 
where 



T^ln 
2 



detg(q u ) 



XAT) := ( j 



M+e 

2tt \ 2 CtQM 



(2.21) 



(2.22) 
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Repeating the analogy with free energies and entropies in the last paragraphs 
of the previous subsection, we define 



F r (g"):=y E (^)-TS r V) , 
S*(q u ):= | In [dot g(q u ) , 



(2.23b) 



(2.23a) 



being the rigid equilibrium probability in the soft subspace £ x £ 



cxp[~(3F(q u )] 

z! 



with Z. 




dq u exp[-/3F r (q u )] . (2.24) 



As in the case of G, we have shown in ref. [58] that the determinant of 
the reduced mass-metric tensor g may be written, for any molecule, general 
internal coordinates and arbitrary constraints, as a product of two functions; 
one depending only on the external coordinates, and the other only on the 
internal ones q l . Hence the externals-dependent factor in det g{q u ) may be 
integrated out independently to yield a free energy and a probability density P r 
that depend only on the soft internals q 1 (see sec 13. ip . 

To end this subsection, we remark that it is frequent in the literature [9,18,19, 
21-23,38,42,51,57] to define the so-called Fixman's compensating potential [16] 
as the difference between F s (q u ), in eq. 12.121 and F r (q u ), defined above, i.e., 



Hence, performing rigid Molecular Dynamics simulations, which would yield 
an equilibrium distribution proportional to exp[— f3F I (q u )], and adding Vp(q u ) 
to the potential energy V^(q l ) one can reproduce instead the stiff probability 
density P s cx exp[-(3F s (q u )] [14,18,19,21-23,38,39,42,51]. This allows to 
obtain at a lower computational cost (due to the timescale problems discussed 
in the introduction) equilibrium averages that otherwise must be extracted from 
expensive fully flexible whole-space simulations. In fact, it seems that this 
particular application of the theoretical tools herein described, and not the 
search for the correct probability density to sample in Monte Carlo simulations, 
was what prompted the interest in the study of mass-metric tensors effects. 

Finally, in table [2] we summarize the equilibrium probability densities and 
the different correcting terms derived in this section. 



TS*(q u )-TSC(q*)-TS*(q u ) 
RT det G(q u ) 



(2.25) 



2 dctH(q l ) det g(q u ) 
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Classical Stiff Model 



Classical Rigid Model 



Ps(q u ) 



exp[- f3F s {q u ) 



exp[-/?F r (g u ; 



F s (q u ) := Vfc(g<) - T(S^(q l ) + S s k (g 11 )) W) := V^{q l ) - TS*(q u ) 



i? 



detG(g",/V)) 
In detH(<^ 



S r V) :=~ln det 5 (g") 



Table 2: Equilibrium probability densities and correcting terms to the potential energy 
Ve(? 8 ) in the classical stiff and rigid models of constraints. 



3 Methods 

3.1 Factorization of the external coordinates 

In the recent work [58] , we have shown that the determinant of the mass-metric 
tensor G in eq. I2.12cl can be written as follows if the SASMIC [59] coordinates 
for general branched molecules are used: 



det G = TT ml sm 2 9 TTrJ TT sm 2 6 a , (3.1) 




where the r a are bond lengths and the 6 a bond angles. 

Note that this expression, whose validity was proved for the more particular 
case of serial polymers by Go and Scheraga [15] and, before, by Volkenstein [74], 
does not explicitly depend on the dihedral angles. However, it may depend on 
them via the hard coordinates if the constraints in the form presented in eq. 12.11 
are used. 

The term depending on the masses of the atoms in the expression above may 
be dropped from eq. I2.12c[ because it does not depend on the conformation, and 
the only part of detG that depend on the external coordinates, sin 2 #, may be 
integrated out in eq. 12.101 (8 is one of the externals q A that describe the overall 
orientation of the molecule; see ref. 58 for further details). Hence, the kinetic 
entropy due to the mass-metric tensor G in the stiff case, may be written, up 
to additive constants, as 



= f 



(3.2) 



^ln(^)+^ln(sin 2 o 

_a=2 a=3 

where the individual contributions of each degree of freedom have been fac- 
torized. 

Also in reference [58] , we have shown that the determinant of the reduced 
mass-metric tensor g in eq. I2.23bl can be written as follows: 
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det g = sin 2 9 det 32(9*) j 

being the matrix gi 



(3.3) 



92 



m tot / (3) m tot v(R) 

m tot v T {R) J 



dR 



mtot- 



\ 



mtot 



dR 

dqi 



a aq 1 oqi 



(3.4) 



/ 



where the superindex T indicates matrix transposition, I*- 3 -* denotes the 3x3 
identity matrix and x' a is the position of atom a in the reference frame fixed in 
the system (the 'primed' reference frame). 

Additionally, we denote the total mass of the system by mtot '■= J2a m °f> ^ ne 
position of the center of mass of the system in the primed reference frame by 
R := m^Jj. ^2 a m a x' a and the inertia tensor of the system, also in the primed 
reference frame, by 



J:-- 



m a x a 
m a x' 



- J2o, ™a X L X 'a" - E)o 

£ o «>a((O a +0O a ) "Eo 



7Tl a X a 

m a x' 



(3.5) 



The matrix v(R) is defined as: 



v(R) := 






-R 3 


i? 2 


i? 3 





-R 1 


-R 2 


i? 1 






(3.6) 



and x denotes the usual vector cross product. 

Then, since sin 2 may be integrated out in eq. 12.211 we can write, omitting 
additive constants, the kinetic entropy associated to the reduced mass-metric 
tensor g depending only on the soft internals q l : 



= ! In 



(3.7) 



Finally, one may note that, since sin 2 # divides out in the second line of 
cq. 12.251 or, otherwise stated, eqs. 13.21 and 13.71 may be introduced in the first 
line, then the Fixman's potential is independent of the external coordinates as 
well. 
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3.2 Computational Methods 

In the particular molecule treated in this work (the model dipeptide HCO-L- 
Ala-NH 2 in fig. [T|), the formulae in the preceding sections must be used with 
M = 2, being the internal soft coordinates q l = (<fr, tp) the typical Ramachandran 
angles [75] (see tabled, the total number of coordinates TV = 48 and the number 
of hard internals L = 40. 

Regarding the side chain angle x, h has been argued elsewhere [59] that it is 
soft with the same right as the angles <f> and ip, i.e., the barriers that hinder the 
rotation on this dihedral are comparable to the ones existing in the Ramachan- 
dran surface. However, the height of these barriers is sufficient (~ 6-12 RT, see 
ref. [59]) for the condition (ii) in sec. l2.1l to hold and, therefore, its inclusion in 
the set of hard coordinates is convenient due to its unimportant character (see 
discussion in sec. l2.1|) . Moreover, to describe the behaviour associated to x with 
a probability density different from a Gaussian distribution (i.e., its potential 
energy different from an harmonic oscillator) , for example with the tools used in 
the field of circular statistics [76-78] , would severely complicate the derivation of 
the classical stiff model without adding any conceptual insight to the problem. 
In addition, although x is a periodic coordinate with threefold symmetry, the 
considerable height of the barriers between consecutive minima allows to make 
the quadratic assumption in eq. 12.31 at each equivalent valley and permits the 
approximation of the integral on x by three times a Gaussian integral. The mul- 
tiplicative factor 3 simply adds a temperature- and conformation-independent 
reference to the configurational entropy S£ in cq. I2.12bl 

The same considerations are applied to the dihedral angles, uiq and uj\ (see 
table [3]), that describe the rotation around the peptide bond, and the quadratic 
approximation described above can also be used, since the heights of the rotation 
barriers around these degrees of freedom are even larger than the ones in the 
case of x- 



Figure 1: Atom numeration of the protected dipeptide HCO-L-Ala-NHb. 
The ab initio quantum mechanical calculations have been done with the 
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package GAMESS [79] under Linux and in 3.20 GHz PIV machines. The co- 
ordinates used for the HCO-L-Ala-NH 2 dipeptide in the GAMESS input files 
and the ones used to generate them with automatic Perl scripts are the SAS- 
MIC coordinates introduced in ref. 59. They are presented in table [3] indicating 
the name of the conventional dihedral angles (see also fig. [1] for reference). To 
perform the energy optimizations, however, they have been converted to Delo- 
calizcd Coordinates [80] in order to accelerate convergence. 



Atom name Bond length Bond angle Dihedral angle 



Hi 








C 2 


(2,1) 






N 3 


(3,2) 


(3,2,1) 




O4 


(4,2) 


(4,2,1) 


(4,2,1,3) 


c 5 


(5,3) 


(5,3,2) 


co :=(5,3,2,1) 




(6,3) 


(6,3,2) 


(6,3,2,5) 


C 7 


(7,5) 


(7,5,3) 


0:^(7,5,3,2) 


Cs 


(8,5) 


(8,5,3) 


(8,5,3,7) 


H 9 


(9,5) 


(9,5,3) 


(9,5,3,7) 


Hio 


(10,8) 


(10,8,5) 


X :=(10,8,5,3) 


Hu 


(11,8) 


(11,8,5) 


(11,8,5,10) 


Hl2 


(12,8) 


(12,8,5) 


(12,8,5,10) 


N13 


(13,7) 


(13,7,5) 


yj :=(13,7,5,3) 


Ol4 


(14,7) 


(14,7,5) 


(14,7,5,13) 


Hi 5 


(15,13) 


(15,13,7) 


Wi :=(15, 13,7,5) 


Hi6 


(16,13) 


(16,13,7) 


(16,13,7,15) 



Table 3: SASMIC internal coordinates (Echenique P. and Alonso J. L., To be published 
in J. C'omp. Chem., |arXiv : q-bio . BM/051 1004 ) in Z-matrix form of the protected 
dipeptide HCO-L-Ala-NEb. Principal dihedrals are indicated in bold face and their 
typical biochemical name is given. 

First, we have calculated the typical Potential Energy Surface (PES) in 
a regular 12x12 grid of the bidimensional space spanned by the Ramachan- 
dran angles (f> and ip, with both angles ranging from —165° to 165° in steps 
of 30°. This has been done by running constrained energy optimizations at 
the MP2/6-31++G(d,p) level of the theory, freezing the two Ramachandran 
angles at each value of the grid, starting from geometries previously optimized 
at a lower level of the theory and setting the gradient convergence critcrium 
to OPTTOL=10~ 5 and the self-consistent Hartree-Fock convergence criterium to 
CONV=10~ 6 . 

The results of these calculations (which took ~ 100 days of CPU time) are 
144 conformations that define S and the values of Ve(0, i/j) at these points (the 
PES itself). 

Then, at each optimized point of S, we have calculated the Hessian matrix 
in the coordinates of table [3] removing the rows and columns corresponding to 
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the soft angles <\> and ijj, the result being the matrix Ti.((j),ip) m eq. I2.12bl This 
has been done, again, at the MP2/6-31++G(d,p) level of the theory, taking 
- 140 days of CPU time. 

Eqs. 13.21 and 13.71 in sec. 13.11 have been used to calculate the kinetic en- 
tropy terms associated to the determinants of the mass-metric tensors G and 
g, respectively. The quantities in eq. 13.21 being simply internal coordinates, 
have been directly extracted from the GAMESS output files via automated Perl 
scripts. On the other hand, in order to calculate the matrix g 2 in eq. 13.41 that 
appears in the kinetic entropy of the classical rigid model, the Euclidean coor- 
dinates x' a of the 16 atoms in the reference frame fixed in the system, as well as 
their derivatives with respect to q l = -0) , must be computed. For this, two 
additional 12x12 grids as the one described above have been computed; one of 
them displaced 2° in the positive ^-direction and the other one displaced 2° in 
the positive ^-direction. This has been done, again, at the MP2/6-31+- |-G(d,p) 
level of the theory, starting from the optimized structures found in the com- 
putation of the PES described above and taking ~ 75 days of CPU time each 
grid. Using the values of the positions x' a in these two new grids and also in 
the original one, the derivatives of these quantities with respect to the angles <j> 
and ip, appearing in g 2 , have been numerically obtained as finite differences. 

The three calculations have been repeated for six special points in the Ra- 
machandran space that correspond to important elements of secondary struc- 
ture (see sec. 2]), the total CPU time needed for computing all correcting terms 
at these points has been ~ 16 days. A total of ~ 406 days of CPU time has 
been needed to perform the whole study at the MP2/6-31+- |-G(d,p) level of the 
theory. 

Finally, we have repeated all the calculations at the HF/6-31++G(d,p) level 
of the theory in order to investigate if this less demanding method (~ 10 days 
for the PES, ~ 8 days for the Hessians, ~ 10 days for each displaced grid, ~ 2 
days for the special secondary structure points, being a total of ~ 40 days of 
CPU time) may be used instead of MP2 in further studies. 

4 Results 

In table [H the maximum variation, the average and the standard deviation in 
the 12x12 grid defined in the Ramachandran space of the protected dipeptide 
HCO-L-Ala-NH 2 are shown for the three energy surfaces, Vs, F B and F r (see 
eqs.[2IE2]andCIlI]), for the three correcting terms, -T5 s k , -T5 S C , and -TS* and 
for the Fixman's compensating potential Up (see eq. I2.25[) . All the functions 
have been referenced to zero in the grid. 

In fig. [2] the Potential Energy Surface Us, at the MP2/6-31++G(d,p) level 
of the theory, is depicted with the reference set to zero for visual convenienc^l 
Neither the surfaces defined by F s and F r at the MP2/6-31++G(d,p) level of 

6 At the level of the theory used in the calculations, the minimum of Vj:(0, VO in the grid 
is -416.0733418995 hartree. 
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MP2/6-31++G(d,p) HF/6-31++G(d,p) 
Max. a Ave. 6 Std. c Max. a Ave. 6 Std. c 



21.64 
21.43 
21.09 



6.76 
6.47 
6.46 



3.88 
3.93 
3.82 



23.62 
23.78 
23.09 



6.92 
7.17 
6.76 



4.35 
4.38 
4.31 



TS r k 

V F 



0.24 
1.67 
0.81 
1.68 



0.09 
0.98 
0.37 
0.89 



0.05 
0.32 
0.12 
0.30 



0.23 
1.34 
0.75 
1.35 



0.09 
0.63 
0.38 
0.55 



0.04 
0.30 
0.12 
0.27 



Table 4: "Maximum variation, 'average and Standard deviation in the 12x12 grid 
defined in the Ramachandran space of the protected dipeptide HCO-L-Ala-NH2 for 
the three energy surfaces, Vs, F B and F T , the three correcting terms, —TS B , —TS^, 
and —TS^ and the Fixman's compensating potential Vf- The results at both MP2/6- 
31++G(d,p) and HF/6-31++G(d,p) levels of the theory are presented and all the 
functions have been referenced to zero in the grid. The units used are kcal/mol. 

the theory nor the three energy surfaces Vs, F s and F r at HF/6-31++G(d,p) 
are shown graphically since they are visually very similar to the surface in fig. [5J 

In fig. [31 the three correcting terms, — TS 1 ^, — TS£ and — TS^ and the Fix- 
man's compensating potential Vf, at the MP2/6-31++G(d,p) level of the the- 
ory, are depicted with the reference set to zero. The analogous surfaces at the 
HF/6-31++G(d,p) level of the theory are visually very similar to the ones in 
fig. [3] and have been therefore omitted. 

From the results presented, one may conclude that, although the conforma- 
tional dependence of the correcting terms — T5 k , —TS^ and —TS^ is more than 
an order of magnitude smaller than the conformational dependence of the Poten- 
tial Energy Surface Vs in the worst case, if chemical accuracy (typically defined 
in the field of ab initio quantum chemistry as 1 kcal/mol [81]) is sought, they 
may be relevant. In fact, they are of the order of magnitude of the differences 
between the energy surfaces Vs, F s and F T calculated at MP2/6-31++G(d,p) 
and the ones calculated at HF/6-31++G(d,p). 

For the same reasons, we may conclude that, if ab initio derived potentials 
are used to carry out Molecular Dynamics simulations of peptides, the Fixman's 
compensating potential Vf should be included. Finally, regarding the relative 
importance of the different correcting terms —TS^, —TS^ and —TS^, the results 
in table [4] suggest that the less important one is the kinetic entropy —TS^ of the 
stiff case (related to the determinant of the mass-metric tensor G) and that the 
most important one is the one related to the determinant of the Hessian matrix 
TL of the constraining part of the potential, i.e., the conformational entropy 
—TS^. The first conclusion is in agreement with the approximations typically 
made in the literature, the second one, however, is not (see the Appendix). 
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v E H>,y) 




Figure 2: Potential Energy Surface (PES) of the model dipeptide HCO-L-Ala-NH 2 , 
computed at the MP2/6-3H — hG(d,p) level of the theory. The surface has been refer- 
enced to zero and smoothed with bicubic splines for visual convenience. The units in 
the z-axis are kcal/mol. 




Figure 3: Ramachandran plots of the correcting terms appearing in eqs. l2.12l and l2.23l 
together with the Fixman's compensating potential defined in eq. 12.251 computed at 
the MP2/6-31++G(d,p) level of the theory in the model dipeptide HCO-L-Ala-NH 2 . 
The surfaces have been referenced to zero and smoothed with bicubic splines for visual 
convenience. The units in the z-axes are kcal/mol. 
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Now, although the relative sizes of the conformational dependence of the 
different terms may be indicative of their importance, the degree of correlation 
among the surfaces is also relevant (see table [5]) . Hence, in order to arrive to 
more precise conclusions, we reexamine here the results using a physically mean- 
ingful criterium to compare potential energy functions that has been introduced 
in ref. 82. The distance, denoted by di 2 , between any two different potential 
energy functions, V\ and V2, is an statistical quantity that, from a working set 
of conformations (in this case, the 144 points of the grid), measures the typi- 
cal error that one makes in the energy differences if V<x is used instead of V\, 
admitting a linear rescaling. 

In table El which contains the central results of this work, the distances 
between some of the energy surfaces that play a role in the problem are shown. 
We present the result in units of RT (at 300° K, where RT ~ 0.6 kcal/mol) 
because it has been argued in ref. 82 that, if the distance between two different 
approximations of the energy of the same system is less than RT, one may safely 
substitute one by the other without altering the relevant physical properties. 
Moreover, if one assumes that the effective energies compared will be used to 
construct a polypeptide potential and that it will be designed as simply the sum 
of mono-residue ones (making each term suitably depend on different pairs of 
Ramachandran angles), then, the number N Ies of residues up to which one may 
go keeping the distance between the two approximations of the the ./V-residue 
potential below RT is (see eq. 23 in ref. 82): 



This number is also shown in table [51 together with the slope 612 of the 
linear rescaling between V\ and V2 and the Pearson's correlation coefficient [83] , 
denoted by 1*12. 

The results at both MP2/6-31++G(d,p) and HF/6-31++G(d,p) levels of 
the theory are presented. The first three rows in each of the first two blocks are 
related to the classical stiff model, the next row to the classical rigid model and 
the last one in each block to the comparison between the two models. The third 
block in the table is associated to the comparison between the two different 
levels of the theory used. 

The F s vs. Vs row (in the first two blocks) assess the importance of the two 
correcting terms, —TS^ and —TS^, in the stiff case. The result d\2 = 0.74RT 
indicates that, for the alanine dipeptide, Vg may be used as an approxima- 
tion of F s with caution if accurate results are sought. In fact, the low value of 
iVres = 1.82 < 2 shows that, if we wanted to describe a 2-residue peptide omit- 
ting the stiff correcting terms, we would typically make an error greater than 
the thermal noise in the energy differences. The next two rows investigate the 
effect of each one of the individual correcting terms. The conclusion that can 
be extracted from them (as the relative sizes in table [4] already suggested) is 
that the conformational entropy associated to the determinant of the Hessian 
matrix 7i is much more relevant than the correcting term —TS^, related to the 




(4.1) 
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Corr. a V," V 2 C d 12 d N les e b 12 ? r 12 s 



MP2/6-31++G(d,p) 



-TS k - TS!: 


F s 




0.74 RT 


1.82 


0.98 


0.9967 




F s 




0.74 RT 


1.83 


0.98 


0.9967 


-TS k 


F s 


V* - TS C S 


0.11 RT 


80.45 


1.00 


0.9999 


-TS* r k 


F T 




0.29 RT 


11.62 


1.01 


0.9995 


Vf 


F s 


F? 


0.67 RT 


2.24 


0.97 


0.9972 






HF/6-31+ 


+G(d,p) 








-TS k - TS^ 


F s 




0.73 RT 


1.90 


0.99 


0.9975 


-TS* 


F s 


Vn - TS\ 


0.71 RT 


2.00 


0.99 


0.9976 




F s 


V* - TS C S 


0.10 RT 


90.99 


1.00 


0.9999 


-TS* r k 


F T 


Vs 


0.26 RT 


14.83 


1.01 


0.9997 


Vf 


F s 


F r 


0.61 RT 


2.69 


0.98 


0.9982 



MP2/6-31++G(d,p) vs. HF/6-31++G(d,p) 





Vs 


1.25 RT 


0.64 


1.12 


0.9925 


F s 




1.18 RT 


0.72 


1.11 


0.9934 


F T 


F r 


1.18 RT 


0.72 


1.12 


0.9932 



Table 5: Comparison of different energy surfaces involved in the study of the con- 
strained equilibrium of the protected dipeptide HCO-L-Ala-NEb. "Correcting term 
whose importance is measured in the corresponding row, ^reference potential energy 
V\ (the "correct" one, the one containing the correcting term), Approximated poten- 
tial energy V2 (i.e, Vi minus the correcting term in column a), d statistical distance 
between Vi and V2 (see Alonso J. L. and Echenique P., J. C'omp. Chem. 27 (2006) 
238-252), 'maximum number of residues in a polypeptide potential up to which the 
correcting term in column a may be omitted, ■'slope of the linear rescaling between V\ 
and V2 and 9 Pearson's correlation coefficient. All quantities are dimensionless, except 
for di2 which is given in units of the thermal energy RT at 300° K. 

mass-metric tensor G, allowing to drop the latter up to ~ 80 residues (accord- 
ing to MP2/6-31++G(d,p) calculations). As has been already remarked, this 
second conclusion is in agreement with the approximations frequently done in 
the literature; however, it turns out that the importance of the Hessian-related 
term has been persistently underestimated (see the Appendix for a discussion). 

The F r vs. Vs row, in turn, shows the data associated to the kinetic entropy 
term —TS^, which is related to the determinant of the reduced mass-metric 
tensor g in the classical rigid model. From the results there (d\ 2 = 0.29RT and 
N Ies = 11.62 at the MP2/6-31++G(d,p) level), we can conclude that the only 
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MP2/6-31++G(d,p) 



Vz 


vs. 




0.1572 


V* 


vs. 


-TSs 


-0.0008 


Vz 


vs. 


-TS k 


-0.3831 


Vz 


vs. 


v F 


0.3334 






HF/6-31++G(d,p) 




Vz 


vs. 




0.0682 


Vz 


vs. 




0.0897 


Vz 


vs. 


-T5 r k 


-0.3544 


Vz 


vs. 


v F 


0.2404 



MP2/6-31++G(d,p) vs. HF/6-31++G(d,p) 

-TS° vs. -T5 S C 0.9136 

-TS k vs. -TS k 0.9808 

-TS k vs. -TS k 0.9316 

V F vs. V F 0.9217 



Table 6: Correlation between the different correcting terms involved in the study of 
the constrained equilibrium of the protected dipeptide HCO-L-Ala-NEh. "Reference 
potential energy, 6 approximated potential energy, c Pearson's correlation coefficient. 

correction term in the rigid case is less important than the ones in the stiff case 
and that Vz may be used as an approximation of F r for oligopeptides of up to 
~ 12 residues. 

The last row in each of the first two blocks in table [5] is related to the 
interesting question in Molecular Dynamics of whether or not one should include 
the Fixman's compensating potential Vp (see eq. 12.25ft in rigid simulations in 
order to obtain the stiff equilibrium distribution, exp(— [3F S ), instead of the 
rigid one, exp(—(3F T ). This question is equivalent to asking whether or not F T 
is a good approximation of F s . From the results in the table, we can conclude 
that the Fixman's potential is relevant for peptides of more than 2 residues and 
its omission may cause an error greater than the thermal noise in the energy 
differences. 

The appreciable sizes of the different correcting terms, shown in table |H 
together with their low correlation with the Potential Energy Surface Vs, pre- 
sented in the first two blocks of table [6j explain their considerable relevance 
discussed in the preceding paragraphs. 

Moreover, from the comparison of the MP2/6-31++G(d,p) and the HF/6- 
31++G(d,p) blocks, one can tell that the study herein performed may well have 
been done at the lower level of the theory (if we had known) with a tenth of the 
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computational effort (see sec. [3]). This fact, explained by the high correlation, 
presented in the third block of table [6l between the correcting terms calcu- 
lated at the two levels, is very relevant for further studies on more complicated 
dipcptides or longer chains and it indicates that the differences in size between 
the different correcting terms at MP2/6-31++G(d,p) and HF/6-31 ++G(d,p), 
which are presented in table 21 are mostly due to a harmless linear scaling ef- 
fect similar to the well-known empirical scale factor frequently used in ab initio 
vibrational analysis [84-86]. This view is supported by the data in the third 
block of table related to the comparison between the energy surfaces calcu- 
lated at MP2/6-31++G(d,p) and HF/6-31++G(d,p), where the slopes b 12 are 
consistently larger than unity. 

A last conclusion that may be extracted from the block labeled "MP2/6- 
31++G(d,p) vs. HF/6-31++G(d,p)" in table [5] is that the typical error in the 
energy differences (given by the distances d.12) produced when one reduces the 
level of the theory from MP2/6-31++G(d,p) to HF/6-31++G(d,p) is compara- 
ble (less than twice) to the error made if the most important correcting terms 
of the classical constrained models studied in this work are dropped. This is a 
useful hint for researchers interested in the conformational analysis of peptides 
with quantum chemistry methods [60-64, 70, 87] and also to those whose aim is 
the design and parametrization of classical force fields from ab initio quantum 
mechanical calculations [68-70]. 



(j> V 


a-helix 


-57 


-47 


3io-helix 


-49 


-26 


7r-helix 


-57 


-70 


polyprolinc II 


-79 


149 


parallel /3-sheet 


-119 


113 


antiparallel /3-sheet 


-139 


135 



Table 7: Ramachandran angles (in degrees) of some important secondary structure 
elements in polypeptides. Data taken from Lesk A. M., Introduction to Protein Ar- 
chitecture, Oxford University Press, Oxford, 2001. 

Finally, in order to enrich and qualify the analysis, a new working set of 
conformations, different from the 144 points of the grid in the Ramachandran 
space, have been selected and the whole study has been repeated on them. 
These new conformations are six important secondary structure elements which 
form repetitive patterns stabilized by hydrogen bonds in polypeptides. Their 
conventional names and the corresponding values of the <fi and ip angles have 
been taken from ref . 88 and are shown in table [7] 

In fig. 31 the relative energies of these conformations are shown for the three 
relevant potentials, V s , F s and F T , at both MP2/6-31++G(d,p) and HF/6- 
31++G(d,p) levels of the theory. Since the antiparallel /3-sheet is the structure 
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Figure 4: Relative energies of some important elements of secondary structure for the 
three potentials Vs, F s and F T , in the model dipeptide HCO-L-Ala-NHb and at both 
MP2/6-31++G(d,p) and HF/6-31++G(d,p) levels of the theory. The energy of the 
antiparallel /3-sheet has been taken as reference. The units are kcal/mol. 



with the minimum energy in all the cases, it has been set as the reference and 
the rest of energies in the figure should be regarded as relative to it. 

The meaningful assessment, using the statistical distance described above, 
of the typical error made in the energy differences has been also performed on 
this new working set of conformations. The results are presented in table [H] 

The distances between the free energies, F s and F T , and their corresponding 
approximations obtained dropping the correcting entropies, —TS^, —TS^ and 
—TSf, or the Fixman's compensating potential Vp, in the first two blocks of 
the table, are consistently smaller than the ones found in the study of the grid 
defined in the whole Ramachandran space (cf. tabled]). And so are the distances 
between the three relevant potentials, Vs, F s and F T , calculated at the MP2/6- 
31++G(d,p) and HF/6-31++G(d,p) levels of the theory. 

Although the distance d\i used is a statistical quantity and, therefore, one 
must be cautious when working with such a small set of conformations (of size 
six, in this case), the conclusion drawn from this second part of the study is 
that, if one is interested only in the "lower region" of the Ramachandran surface, 
where the typical secondary structure elements lie, then, one may safely neglect 
the conformational dependence of the different correcting terms appearing in the 
study of the constrained equilibrium of peptides. At least, up to oligopeptides 
(poly-alanines) of ~ 10 residues in the worst case (the neglect of the Fixman's 
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Corr.° V 1 b V 2 C d 12 d N rcs e b 12 f r 12 <> 



MP2/6-31++G(d,p) 



-TS* - TS c a 
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Table 8: Comparison of different approximations to the energies of some important 
elements of secondary structure (see table [7} in the study of the constrained equilib- 
rium of the protected dipeptide HCO-L-Ala-NHb. See the caption of table [5] for an 
explanation of the keys in the different columns. 

compensating potential Vp in the F s vs. F r comparison at MP2/6-31++G(d,p)). 

This difference between the two working set of conformations may be ex- 
plained looking at one of the ways of expressing the statistical distance used 
(see eq. 12a in ref. 82): 

di 2 = V2a 2 (l-ry x /2 j (4 . 2 ) 

where r\ 2 is the Pearson's correlation coefficient between the potential ener- 
gies denoted by V\ and V 2 and <r 2 is the standard deviation in the values of V 2 
on the relevant working set of conformations. 

This last quantity, <j 2 , is the responsible of the differences between tables [5] 
and[H since the set of conformations comprised by the six secondary structure 
elements in table[7]spans a smaller energy range than the whole Potential Energy 
Surface in fig. [2] (or F s , or _F r , which have very similar variations). Accordingly, 
the dispersion in the energy values is smaller: a 2 ~ 2 kcal/mol in the case of 
the secondary structure elements and <r 2 ~ 4 kcal/mol for the grid in the whole 
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Ramachandran space (see tableS]). Since the correlation coefficient in both cases 
are of similar magnitude, the differences in 02 produce a smaller distance d\2 
for the second set of conformations studied, i.e., a smaller typical error made 
in the energy differences when omitting the correcting terms derived from the 
consideration of constraints. 

To end this section, we remark that, although this "lower region" of the 
Ramachandran space contains the most relevant secondary structure elements 
(which are also the most commonly found in experimentally resolved native 
structures of proteins [89-92]) and may be the only region explored in the dy- 
namical or thermodynamical study of small peptides, if the aim is the design 
of effective potentials for computer simulation of polypeptides [68-70], then, 
some caution is recommended, since long-range interactions in the sequence 
may temporarily compensate local energy penalizations and the higher regions 
of the energy surfaces studied could be important in transition states or in some 
relevant dynamical paths of the system. 

In the following section, the many results discussed in the preceding para- 
graphs are summarized. 

5 Conclusions 

In this work, the theory of classical constrained equilibrium has been collected 
for the stiff and rigid models. The pertinent correcting terms, which may be 
regarded as effective entropies, as well as the Fixman's compensating potential, 
have been derived and theoretically discussed (see eqs. 12.121 12.231 and 12.251 
together with the formulae in sec. l3.ip . In addition, the common approximation 
of considering that, for typical internals, the equilibrium values of the hard 
coordinates do not depend on the soft ones, has also been discussed and related 
to the rest of simplifications. The treatment of the assumptions in the literature 
is thoroughly reviewed and discussed in the Appendix. 

In the central part of the work (sec.|l|), the relevance of the different correct- 
ing terms has been assessed in the case of the model dipeptidc HCO-L-Ala-NH2, 
with quantum mechanical calculations including electron correlation. Also, the 
possibility of performing analogous studies at the less demanding Hartree-Fock 
level of the the theory has been investigated. The results found are summarized 
in the following points: 

• In Monte Carlo simulations of the classical stiff model at room tem- 
perature, the effective entropy —TS^, associated to the determinant of 
the mass-metric tensor G, may be neglected for peptides of up to ~ 80 
residues. Its maximum variation in the Ramachandran space is 0.24 
kcal/mol. 

• In Monte Carlo simulations of the classical stiff model at room temper- 
ature, the effective entropy —TS^, associated to the determinant of the 
Hessian H of the constraining part of the potential, should be included 
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for peptides of more than 2 residues. Its maximum variation in the Ra- 
machandran space is 1.67 kcal/mol. 

• In Monte Carlo simulations of the classical rigid model at room tem- 
perature, the effective entropy —TS^, associated to the determinant of 
the reduced mass-metric tensor g, may be neglected for peptides of up 
to ~ 12 residues. Its maximum variation in the Ramachandran space is 
0.81 kcal/mol. 

• In rigid Molecular Dynamics simulations intended to yield the stiff equilib- 
rium distribution at room temperature, the Fixman's compensating po- 
tential Vf should be included for peptides of more than 2 residues. Its 
maximum variation in the Ramachandran space is 1.68 kcal/mol. 

• If the assumption that only the more stable region of the Ramachandran 
space, where the principal elements of secondary structure lie, is relevant, 
then, the importance of the correcting terms decreases and the limiting 
number of residues in a polypeptide potential up to which they may be 
omitted is approximately four times larger in each of the previous points. 

• In both cases (i.e., either if the whole Ramachandran space is considered 
relevant, or only the lower region), the errors made if the most important 
correcting terms are neglected are of the same order of magnitude as the 
errors due to a decrease in the level of theory from MP2/6-31++G(d,p) 
to HF/6-31++G(d,p). 

• The whole study of the relevance of the different correcting terms ( or future 
analogous investigations) may be performed at the HF/6-31-1 — hG(d,p) level 
of the theory, yielding very similar results to the ones obtained at MP2/6- 
31++G(d,p) and using a tenth of the computational effort. 

To end this discussion, some qualifications should be made. On one hand, 
the conclusions above refer to the case in which a classical potential directly 
extracted from the quantum mechanical (Born-Oppcnheimcr) one is used; for the 
considerably simpler force fields typically used for macromolecular simulations, 
the study should be repeated and different results may be obtained. On the 
other hand, the investigation performed in this work has been done in one of 
the simplest dipeptides; both its isolated character and the relatively small size 
of its side chain play a role in the results obtained. Hence, for bulkier residues 
included in polypeptides, these conclusions should be approached with caution 
and much interesting work remains to be done. 
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Appendix 

Many approximations may be done to simplify the calculation of the different 
correcting terms introduced in the previous subsections. The most frequently 
found in the literature are the following three: 

(i) To neglect the conformational dependence of det G. 

(ii) To neglect the conformational dependence of det H . 

(hi) To assume that the hard coordinates are constant, i.e, that the f I (q l ) in 
eq. 12.11 do not depend on the soft coordinates q % . 

The conformational dependence of det g is customarily regarded as important 
since it was shown to be non-negligible even for simple systems some decades 
ago [13,21-23] (normally in an indirect way, while studying the influence of the 
Fixman's compensating potential in cq. 12.251 sec discussion below). With this 
same aim, Patriciu et al. [20] have very recently measured the conformational 
dependence of det g for a serial polymer with fixed bond lengths and bond angles 
(in the approximation (hi)), showing that it is non- negligible and suggesting that 
it may be so also for more general systems. 

Note that, if approximations (i) and (ii) are assumed, then the Fixman's po- 
tential depends only on det g. In fact, whereas in the general case the Fixman's 
compensating potential cannot be simplified beyond the expression in eq. 12.251 
if one assumes approximation (iii) , then the reduced mass-metric tensor g turns 
out to be the subblock of G with soft indices and, in this case, the quotient 
detG/ det g has been shown to be equal to 1/det/i by Fixman [16], where h 
denotes the subblock of G _1 with hard indices, i.e., 



This result has been extensively used in the literature [21-23,38,42,57], since 
each of the internal coordinates q a typically used in macromolecular simulations 
only involves a small number of atoms, thus rendering the matrix h above sparse 
and allowing for efficient algorithms to be used in order to find its determinant. 

Now, although detg is customarily regarded as important, the conforma- 
tional variations of det G are almost unanimously neglected (approximation (i)) 
in the literature [15, 55] and may only be said to be indirectly included in h by 




N 



(5.1) 



27 



the authors that use the expression above [20-23,38,57]. This is mainly due 
to the fact, reported by Go and Scheraga [15] and, before, by Volkenstein [74], 
that det G in a serial polymer may be expressed as in eq. 13.11 being independent 
of the dihedral angles (which are customarily taken as the soft coordinates). If 
one also assumes approximation (iii), which, as will be discussed later, is very 
common, then det G is a constant for every conformation of the molecule. 

Probably due to computational considerations, but also sometimes to the use 
of a formulation of the stiff case based on delta functions [51] , the conformational 
dependence of det H is almost unanimously neglected (approximation (ii)) in the 
literature [15, 16, 19,20,41,55,65,66]. Only a few authors include this term in 
different stages of the reasoning [13-15,18,19,26,39], most of them only to argue 
later that it is negligible. 

Although for some simple ad hoc designed potentials that lack long-range 
terms [21,22,57], the aforementioned simplifying assumptions and the ones that 
will be discussed in the following paragraphs may be exactly fulfilled, in the 
case of the potential energies used in force fields for macromolccular simulation 
[6,27-37], they are not. The typical energy function in this case, has the form 

, N r N e 

a—1 a—1 
+ Vr S (0a) + < nS " rangC (9 a ) , (5.2) 

where r a are bond lengths, 8 a are bond angles, (j) a are dihedral angles and, 
for the sake of simplicity, no harmonic terms have been assumed for out-of- 
plane angles or for hard dihedrals (such as the peptide bond ui). N r is the 
number of bond lengths, Ng the number of bond angles and the quantities K ra , 
Kg a , r° and 9® are constants. The term denoted by Vfj ors (<^> a ) is a commonly 
included torsional potential that depends only on the dihedral angles <j) a and 
y f km g " lange (g a ) norma jiy comprises long-range interactions such as Coulomb or 
van der Waals; hence, it depends on the atomic positions x' a which, in turn, 
depend on all the internal coordinates q a . 

One of the reasons given for neglecting detH, when classical force fields 
are used with potential energy functions such as the one in eq. 15. 2[ is that 
the harmonic constraining terms dominate over the rest of interactions and, 
since the constants appearing on these terms (the K Ta , Kg a in eq. 15. 2j) are in- 
dependent of the conformation by construction, so is detH [15,19,26]. Here, 
we analyze a more realistic quantum-mechanical potential and these consider- 
ations are not applicable, however, they also should be checked in the case of 
classical force fields, since, for a potential energy such as the one in eq. 15. 2p . 
the quantities K Ta and Kg a are finite and the long-range terms will also af- 
fect the Hessian at each point of the constrained hypersurface E, rendering its 
determinant conformation- dependent. 

For the same reason, even in classical force fields, the equilibrium values of 
the hard coordinates are not the constant quantities r° and 0® in eq. 15.21 but 
some functions f I (q l ) of the soft coordinates (see eq. l2.1|) . This fact, recognized 
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by some authors [15,25,45,93], provokes that, if one chooses to assume approx- 
imation (iii) and the constants r° and 0® appearing in eq. 15.21 are designated as 
the equilibrium values, the potential energy in £ may be heavily distorted, the 
cause being simply that the long-range interactions between atoms separated 
by three covalent bonds are not fully relaxed [93] . This effect is probably larger 
if bond angles, and not only bond lengths, are also constrained, which may par- 
tially explain the different dynamical behaviour found in ref. 6 when comparing 
these types of constraints in Molecular Dynamics simulations. In quantum me- 
chanical calculations of small dipeptides, on the other hand, the fact that the 
bond lengths and bond angles depend on the Ramachandran angles ip) has 
been pointed out by Schaffer et al. [94]. Therefore, approximation (iii), which is 
very common in the literature [6,13,15,16,18-20,26,38-42,51,52,55,65,66,95,96], 
should be critically analyzed in each particular case. 

Apart from the typical internal coordinates q a used until now, in terms of 
which the constrained hypersurface £ is described by the relations q 1 = f I {q l ) 
in eq. 12. 1[ with I = M + 7, . . . , N, one may define a different set Q a such that, 
on S, the corresponding hard coordinates are arbitrary constants Q 1 = C 1 (the 
external coordinates q A and Q A are irrelevant for this part of the discussion). 
To do this, for example, let 

Q i := q i i = 7,...,M + 6 and , . 

Q 1 ■= q 1 - f 1 + c 1 i = m + 7,...,n . ( ' 

Well then, while the relation between bond lengths, bond angles and dihedral 
angles (the typical q a [59]) and the Euclidean coordinates is straightforward 
and simple, the expression of the transformation functions Q a (x^) needs the 
knowledge of the f 1 , which must be calculated numerically in most real cases. 
This drastically reduce the practical use of the Q a , however, it is also true that 
they are conceptually appealing, since they have a property that closely match 
our intuition about what the soft and hard coordinates should be (namely, that 
the hard coordinates Q 1 are constant on the relevant hypersurface E); and this 
is why we term them exactly separable hard and soft coordinates. Now, we 
must also point out that, although the real internal coordinates q a do not have 
this property, they are usually close to it. The customary labeling of soft and 
hard coordinates in the literature is based on this circumstance. Somehow, the 
dihedral angles are the "softest" of the internal coordinates, i.e., the ones that 
"vary the most" when the system visits different regions of the hypersurface 
S; and this is why we term the real q a approximately separable hard and soft 
coordinates, considering approximation (iii) as a useful reference case. 

To sum up, the three simplifying assumptions (i), (ii) and (iii) in the begin- 
ning of this section should be regarded as approximations in the case of classical 
force fields, as well as in the case of the more realistic quantum-mechanical po- 
tential investigated in this work, and they should be critically assessed in the 
systems of interest. Here, while studying the model dipeptide HCO-L-Ala-NH2, 
no simplifying assumptions of this type have been made. 
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